Fix an issue with one_level_list schemas in parquet reader. #10750
…nformation to propagate between columns, causing crashes.
LGTM. Thanks!
Thanks for explaining it to me, @nvdbaranec.
I realize there isn't a good way to test the readers without checking in the parquet file itself. :/
rerun tests
Codecov Report
@@ Coverage Diff @@
## branch-22.06 #10750 +/- ##
================================================
+ Coverage 86.36% 86.43% +0.06%
================================================
Files 142 143 +1
Lines 22302 22444 +142
================================================
+ Hits 19261 19399 +138
- Misses 3041 3045 +4
Continue to review full report at Codecov.
@gpucibot merge
Partially addresses: #10733
For a particular way of encoding list schemas (the older one-level list encoding, which Spark appears to use sometimes), the parquet reader was accidentally propagating incorrect nesting information from one column to the next. It was a simple bug: an extra value was not being popped off a stack.
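To make the "missing pop" concrete, here is a minimal C++ sketch under stated assumptions. `SchemaNode`, `Level`, `Depths`, and `collect_list_depths` are hypothetical names invented for illustration, not cudf's actual reader code. The model simply assumes that each schema node pushes one stack entry for itself, and that a one-level list (a bare `repeated` field with no LIST-annotated wrapper group) pushes one extra entry for its implied list level.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified schema node (NOT cudf's real schema type).
struct SchemaNode {
  std::string name;
  bool is_one_level_list = false;  // bare `repeated` field, no LIST wrapper
  std::vector<SchemaNode> children;
};

// One entry on the (hypothetical) nesting stack.
struct Level {
  std::string name;
  bool is_list;
};

// (leaf column name, number of list levels above that column)
using Depths = std::vector<std::pair<std::string, std::size_t>>;

// Depth-first walk that records the list depth of every leaf column.
void collect_list_depths(const SchemaNode& node, std::vector<Level>& stack, Depths& out) {
  const std::size_t saved = stack.size();  // stack height on entry

  if (node.is_one_level_list) {
    stack.push_back({node.name, true});  // the extra, implied list level
  }
  stack.push_back({node.name, false});  // the node itself

  if (node.children.empty()) {
    // Leaf column: count how many list levels are currently on the stack.
    std::size_t list_depth = 0;
    for (const auto& level : stack) {
      if (level.is_list) ++list_depth;
    }
    out.emplace_back(node.name, list_depth);
  }
  for (const auto& child : node.children) {
    collect_list_depths(child, stack, out);
  }

  // Pop everything this node pushed, including the extra list entry.
  // A single stack.pop_back() here would leave that list entry behind,
  // and every later sibling column would inherit a bogus list level.
  stack.resize(saved);
}
```

For a schema like `root { repeated int32 a; int32 b; }`, this reports a list depth of 1 for `a` and 0 for `b`. Replacing the `resize(saved)` with a single `pop_back()` would leave `a`'s list entry on the stack, so `b` would also be reported as a list of depth 1, which is the kind of cross-column leak described above.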
Note: this is simply a fix so that these files are read correctly. The underlying data in the file is actually of binary type, and cudf converts it to string columns. This PR does not add support for binary as a real type in cudf.